useR! 2024, Salzburg, Austria
July 9, 2024
Data Quality Problems
Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable
This overlap can happen due to:
Inherent Similarity: Natural similarity between classes
Noise: Variability or errors in data collection
Feature Representation: Insufficient or inadequate features to separate classes
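To make the definition concrete, here is a minimal sketch in Python for a single numeric feature. The helper `overlap_fraction` is hypothetical (it is not from the talk or any package) and it assumes the "common region" is simply the interval that both classes occupy.

```python
def overlap_fraction(class_a, class_b):
    """Fraction of all instances lying in the interval both classes occupy.

    Hypothetical helper: treats the shared region as the intersection
    of the two classes' value ranges on a single feature.
    """
    lo = max(min(class_a), min(class_b))  # start of the shared interval
    hi = min(max(class_a), max(class_b))  # end of the shared interval
    if lo > hi:
        return 0.0  # the ranges do not intersect: classes are separable
    points = list(class_a) + list(class_b)
    inside = sum(lo <= x <= hi for x in points)
    return inside / len(points)


# Toy data, made up for illustration only.
separable = overlap_fraction([1.0, 1.5, 2.0], [5.0, 5.5, 6.0])
overlapping = overlap_fraction([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
print(separable, overlapping)
```

A range-based measure like this is sensitive to outliers, so it is only meant to illustrate the idea of a shared region, not to serve as a robust overlap metric.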
Class overlap makes it challenging for classifiers to distinguish accurately between classes
Classifiers struggle to correctly classify instances that fall in overlapping regions
Error rates are higher in areas where classes overlap, because instances there are the most likely to be misclassified
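The effect on error rates can be sketched with synthetic data. The example below is an illustration under stated assumptions, not a result from the talk: two Gaussian classes on one feature, a simple midpoint threshold as the classifier, and a hand-picked band between the class means standing in for the overlap region.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Two synthetic 1-D classes whose distributions overlap around x = 1.
samples = [(random.gauss(0.0, 1.0), 0) for _ in range(2000)]
samples += [(random.gauss(2.0, 1.0), 1) for _ in range(2000)]

THRESHOLD = 1.0    # midpoint rule: predict class 1 when x > 1
BAND = (0.0, 2.0)  # assumed "overlap region" between the two class means


def error_rate(points):
    """Share of (x, label) pairs that the threshold rule gets wrong."""
    if not points:
        return 0.0
    wrong = sum((x > THRESHOLD) != bool(label) for x, label in points)
    return wrong / len(points)


inside = [p for p in samples if BAND[0] <= p[0] <= BAND[1]]
outside = [p for p in samples if not (BAND[0] <= p[0] <= BAND[1])]

error_inside = error_rate(inside)
error_outside = error_rate(outside)
print(f"error inside overlap band:  {error_inside:.3f}")
print(f"error outside overlap band: {error_outside:.3f}")
```

Under these assumptions the misclassifications concentrate inside the band, while instances far from the boundary are almost always classified correctly.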
If class overlap is not addressed, models may become overly complex and overfit: they perform well on training data but poorly on unseen data
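A rough way to see this overfitting effect is to fit a very flexible model to overlapping classes. The sketch below is an assumption-laden illustration: it uses synthetic Gaussian data and a hand-written 1-nearest-neighbour rule (chosen here only because it memorises the training set), and compares accuracy on training data with accuracy on fresh data.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def draw(n):
    """n points per class from two overlapping 1-D Gaussians (synthetic)."""
    data = [(random.gauss(0.0, 1.0), 0) for _ in range(n)]
    data += [(random.gauss(1.5, 1.0), 1) for _ in range(n)]
    return data


def predict_1nn(train, x):
    """Label of the training point closest to x."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def accuracy(train, points):
    """Share of (x, label) pairs whose label the 1-NN rule recovers."""
    hits = sum(predict_1nn(train, x) == label for x, label in points)
    return hits / len(points)


train, test = draw(150), draw(150)
train_acc = accuracy(train, train)  # each point is its own nearest neighbour
test_acc = accuracy(train, test)    # unseen points from the same distribution
print(f"training accuracy: {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
```

The model reproduces every training label, including those in the overlap region that are effectively noise, so the gap between the two accuracies reflects memorisation rather than a better decision boundary.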
Slides created with Quarto, available at prital.netlify.app.